Singing Voice Separation with Deep U-Net Convolutional Networks
Abstract
The decomposition of a music audio signal into its vocal and backing-track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the U-Net architecture (initially developed for medical imaging) to the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction. Through both quantitative evaluation and subjective assessment, experiments demonstrate that the proposed algorithm achieves state-of-the-art performance.
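The abstract frames separation as translating a mixture spectrogram into its sources. One common way to realize that translation, assumed here rather than stated in the abstract, is for the network to output a soft mask in [0, 1] with the same shape as the mixture magnitude spectrogram. Multiplying the mask with the mixture gives the vocal estimate, and its complement gives the backing track. The sketch below shows only that masking step and an L1 training loss in plain Python. The U-Net itself is not shown, and the mask values, along with the names `apply_mask`, `complement` and `l1_loss`, are hypothetical stand-ins.

```python
from typing import List

# Hypothetical representation: rows = frequency bins, columns = time frames.
Spectrogram = List[List[float]]


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Element-wise product of a soft mask and the mixture magnitudes."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]


def complement(mask: Spectrogram) -> Spectrogram:
    """1 - mask: the share of each time-frequency bin left for the other source."""
    return [[1.0 - m for m in row] for row in mask]


def l1_loss(estimate: Spectrogram, target: Spectrogram) -> float:
    """Mean absolute error between estimated and reference magnitudes."""
    diffs = [abs(e - t)
             for e_row, t_row in zip(estimate, target)
             for e, t in zip(e_row, t_row)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    mixture = [[1.0, 2.0], [4.0, 0.5]]
    mask = [[0.5, 1.0], [0.25, 0.0]]  # stand-in for a network's predicted mask
    vocals = apply_mask(mixture, mask)
    backing = apply_mask(mixture, complement(mask))
    print(vocals)   # [[0.5, 2.0], [1.0, 0.0]]
    print(backing)  # [[0.5, 0.0], [3.0, 0.5]]
    print(l1_loss(vocals, [[0.5, 2.0], [1.0, 0.5]]))  # 0.125
```

Because the two masks sum to one in every bin, the vocal and backing estimates add back up to the mixture. Under this assumed setup, training would minimize the L1 loss between the masked output and the isolated reference source.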
Similar resources
Learning to Pinpoint Singing Voice from Weakly Labeled Examples
Building an instrument detector usually requires temporally accurate ground truth that is expensive to create. However, song-wise information on the presence of instruments is often easily available. In this work, we investigate how well we can train a singing voice detection system merely from song-wise annotations of vocal presence. Using convolutional neural networks, multiple-instance learni...
Classification-Based Singing Melody Extraction Using Deep Convolutional Neural Networks
Singing melody extraction is the task that identifies the melody pitch contour of the singing voice from polyphonic music. Most traditional melody extraction algorithms are based on calculating salient pitch candidates or separating the melody source from the mixture. Recently, a classification-based approach built on deep learning has drawn much attention. In this paper, we present a...
Singing Voice Separation Using Deep Neural Networks and F0 Estimation
Deep Neural Networks (DNN) have become a popular approach for speech enhancement and singing voice separation. DNNs are typically trained to estimate a time-frequency mask using ground truth examples. In this submission, we combine DNN estimation as a first step with traditional refinement via F0 estimation, using the YINFFT algorithm.
Singing-voice Separation Using Deep Recurrent Neural Networks
In this paper, we explore using deep recurrent neural networks for singing voice separation from monaural recordings in a supervised setting. We propose jointly optimizing the networks for multiple source signals by including the separation step as a nonlinear operation in the last layer. Discriminative training objectives are further explored to enhance the source to interference ratio. The al...
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of vo...
Publication date: 2017